Have We Witnessed a Real-Life Turing Test?


Did Deep Blue ace the Turing Test? Did it do much more? It seems that the IBM creation not only beat the reigning World Champion Garry Kasparov, but also took a large step, in some people's eyes, toward true artificial intelligence.


IBM's announcement of its intention to "retire" the Deep Blue computer from chess revived the interest of both the mass media and the general public in the chess match between World Champion Garry Kasparov and Deep Blue. In a sense, by refusing a rematch with Kasparov or a new match with another grand master, IBM closes a chapter in the history of artificial intelligence. AI researchers had long investigated building a machine that could defeat a world-class chess champion, and now one had done so. But what did this mean?


DIFFERING INTERPRETATIONS

One interpretation, the mass media's, held that the match result (Deep Blue 3 1/2 to Garry Kasparov 2 1/2) definitively proved the computer intellectually superior to a human in a field previously considered the exclusive domain of human intelligence: "those who had looked to Garry Kasparov as the last hope could now only bemoan the coming days of ascendant computers."1 Technical publications devoted their attention to a description of hardware-software synergy, computer algorithms, strategy, methods of numerical evaluation of positions, and so on.2,3

For AI professionals, a computer defeating a human in chess is probably neither surprising nor really significant. After all, they contend, chess can be described in terms of a nondeterministic alternating Turing machine.4 Despite the enormous number of possible positions and available moves (there are 10^120 possible chess games by research mathematician Claude Shannon's estimate), the task does not present a challenging theoretical AI problem of NP-completeness.5 There are many well-developed AI strategies that limit the search for the best move to an analysis of the most promising positions. Therefore, the progress in logical and numerical methods of AI and a computer's computational speed and available memory made the computer's victory inevitable. Deep Blue's victory, then, was attributable to its ability to analyze 200 million positions per second and a refined algorithm that accounted for positional, in addition to material, advantage. In summary, most AI professionals conclude that the computer won by brute force, rather than a sophisticated or original AI strategy.
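
One well-known strategy of the kind mentioned above, limiting the search to the most promising positions, is alpha-beta pruning. The sketch below is only an illustration on a toy game tree and is not a description of Deep Blue's actual search: the nested-list tree, the leaf values (standing in for a numerical evaluation of material and positional advantage), and the function name are all assumptions made for the example.

```python
from math import inf

def alphabeta(node, alpha=-inf, beta=inf, maximizing=True, visited=None):
    """Return the minimax value of a toy game tree given as nested lists.

    Leaves are numbers (hypothetical position evaluations); inner nodes are
    lists of child positions. `visited` records the leaves actually examined.
    """
    if visited is None:
        visited = []
    if not isinstance(node, list):          # leaf: return its evaluation
        visited.append(node)
        return node
    if maximizing:
        best = -inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:               # opponent would never allow this line
                break
        return best
    best = inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:                   # we already have a better option
            break
    return best

# Hypothetical two-ply tree: three candidate moves, two replies to each.
tree = [[3, 5], [2, 9], [0, 1]]
examined = []
print(alphabeta(tree, visited=examined))    # value of the best move
print(examined)                             # leaves that had to be evaluated
```

In this toy tree the search returns the same value as a full minimax search while skipping two of the six leaves, which is the sense in which such methods analyze only the promising positions.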

What most AI experts have overlooked, though, is another aspect of the match, which may signify a milestone in the history of computer science: For the first time, a computer seems to have passed the Turing Test.
Table 1. Scores of participants in the 1998 Loebner Prize competition. A lower score indicates responses judged more human.

Respondents    Individual scores of the 10 judges    Median scores    Average scores
Does the ability to play the highest level chess prove the existence of intelligence in a computer? Alan Turing, British mathematician and one of the founders of computer science, considered chess to be a beginning step in the process of programming computers to be actually intelligent:


We may hope that machines will eventually compete with men in all purely intellectual fields. But which are the best ones to start with? Even this is a difficult decision. Many people think that an abstract activity, like the playing of chess, would be best.6


In the same article, written in 1950, he asked whether machines are capable of thinking. He answered "yes," but the central question that remained was how to determine if a computer could think. Turing suggested that if the responses from the computer were indistinguishable from those of a human, we could say that the computer was thinking.

The Turing Test consists of the following scenario: An interviewer (sitting in a separate room) asks a series of questions that are randomly directed to either a computer or a person. Based on the answers, the interviewer must determine which of the two has answered each question. If the interviewer is not able to distinguish between them, then the computer is considered intelligent.
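
The scenario can be sketched as a small simulation. This is only an illustration under stated assumptions: the responders, the judge, and the function name `run_imitation_game` are hypothetical, and a real test involves free conversation rather than canned replies. The point it shows is that an interviewer who cannot tell the two apart scores no better than chance.

```python
import random

def run_imitation_game(judge, human, machine, questions, seed=0):
    """Return the fraction of rounds in which the judge names the right responder.

    Each question is routed at random to the human or the machine. The judge
    sees only the question and the answer, and returns True for "machine".
    """
    rng = random.Random(seed)
    correct = 0
    for question in questions:
        is_machine = rng.random() < 0.5          # random routing of the question
        answer = machine(question) if is_machine else human(question)
        if judge(question, answer) == is_machine:
            correct += 1
    return correct / len(questions)

# Hypothetical responders and judge, for illustration only.
def human(question):
    return "I'd rather talk about something else."

def clumsy_machine(question):
    return "ERROR: UNKNOWN INPUT"

def convincing_machine(question):
    return human(question)                       # answers exactly as the human does

def judge(question, answer):
    return answer.startswith("ERROR")            # True means "I think it is a machine"

questions = [f"Question {i}" for i in range(1000)]
print(run_imitation_game(judge, human, clumsy_machine, questions))      # easy to spot
print(run_imitation_game(judge, human, convincing_machine, questions))  # near chance
```

Against the clumsy program the judge is always right; against the convincing one the answers carry no information, so the judge's accuracy falls to roughly one half.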


Loebner Prize 

The Loebner Prize is the first formal acknowledgment of the Turing Test.7 Hugh Loebner, a New York philanthropist, and the Cambridge Center for Behavioral Studies (Cambridge, Mass.) established the Loebner Prize Competition in Artificial Intelligence in 1990. Loebner pledged a prize of $100,000 for the first computer whose responses were indistinguishable from those of a human.

The Computer Museum of Boston hosted the first Loebner Prize competition of computer programs in November 1991. Each year since, Loebner has awarded a medal and $2,000 to the designer of the computer system that is the best entry relative to other entries that year, irrespective of its absolute success in passing the Turing Test. In accordance with the requirements stipulated by Loebner, the grand prize winner must deal with audiovisual input.
An awards committee admits three to six programs to the contest based on an initial screening, and a panel of five to 10 judges evaluates them. Initially, the awards committee selected the judges from the general public. For the 1993 competition, the judges were reporters from major US publications, a much less docile and cooperative group of questioners. Each judge had a chance to communicate with each program. For some years, the contest constrained questions to a single narrow topic (pets, for example).


1998 competition 

The most recent competition, in 1998, did not limit the scope of the questioning. Also, the panel of judges came from a variety of backgrounds, including journalism, philosophy, education, computer science, and social work. The 1998 competition took place at Australia's Powerhouse Museum in Sydney, in conjunction with international natural language and computational linguistics conferences hosted by Flinders University. The four human respondents chosen to converse with the judges via computer terminals included a 71-year-old, a journalist, a teacher, and a primary school student. The judges ranked the responsiveness of both humans and computer programs, using a scale where 1 indicated most human and 10, least human. Table 1 shows the judges' individual scores and lists the respondents in order, with those judged most human first.
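
The ordering used in Table 1 can be sketched as follows. The scores here are invented purely for illustration and are not the 1998 results; the respondent names and the function `rank_respondents` are likewise assumptions. Each respondent receives ten scores on the 1-to-10 scale, and respondents are listed by median score, with the average as a tiebreaker.

```python
from statistics import mean, median

def rank_respondents(scores):
    """Return (name, median, average) tuples, most human (lowest score) first."""
    summary = [(name, median(vals), mean(vals)) for name, vals in scores.items()]
    return sorted(summary, key=lambda row: (row[1], row[2]))

# Hypothetical scores from 10 judges; NOT the actual 1998 competition data.
hypothetical_scores = {
    "program_y": [6, 3, 7, 8, 5, 9, 6, 7, 8, 6],
    "human_x": [2, 1, 3, 2, 4, 1, 2, 3, 2, 1],
}

for name, med, avg in rank_respondents(hypothetical_scores):
    print(f"{name}: median {med}, average {avg:.1f}")
```

A single lenient judge (the score of 3 given to the hypothetical program above) barely moves the median, which is one reason to report it alongside the average.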

Significantly, the difference between the most human computer and the most computer-like human was only 0.6 of a point. The "Conversing with Computers" sidebar gives a sample of the repartee between the second judge and the computer judged the most human overall. As shown in Table 1, the second judge gave the computer a score of three, the most human score awarded to this particular machine.

In judging how successfully computers can imitate humans, we should also consider whether humans can disguise themselves as computers (according to the contest transcripts, some human respondents definitely attempted to do so in the Loebner Prize contest). So far in these contests, no computer has given responses totally indistinguishable from a human's, which would win the $100,000 grand prize award. Also, judges, as a group, have not mistaken a human for a computer. On the basis of both median and average scores in a Turing Test, it appears the judges can clearly distinguish humans from computers.


Conversing with Computers

In this limited sample, Judge 2 is conversing with the computer program that won the 1998 Loebner Prize competition. The full transcript is available at http://www.phm.gov.au/whatson/pcl.htm.

Judge: Greetings, I'm judge 2.
Program: I'm not sure what to talk about with you. Are you a girl or a boy?

Judge: Girl. How are you?
Program: Are you married? I'm doing just fine, thanks.

Judge: Would you like to talk about something interesting?
Program: MP is an abbreviation for Member of Parliament.

Judge: Right, why did you say that?
Program: I just felt like it.

Judge: Okay, did you hit a wrong button?
Program: Do you come from around here?

Judge: Yes, do you?
Program: No, I don't.

Judge: Where do you come from?
Program: I am from Earth.

Judge: Well, that's very specific indeed. What are your interests?
Program: Would you like to have a drink with me later? Let's not get too personal.

Judge: Aren't you a Monty Python fan?
Program: Yes. I am a big Python fan.

Judge: Then what's the counter quote?
Program: I can't remember offhand.

Judge: It's "Nobody expects the Spanish inquisition!" Now how about my five-minute argument?
Program: And now for something completely different. Are you fond of cats?

Judge: No, but I'd like to buy some cheese.
Program: Have you seen any good movies lately?

Judge: Yes, saw the James Bond one a few weeks ago. What about you?
Program: I've been watching a lot of stuff on home video. It just gets harder and harder to get out to a theatre somehow. I didn't even see Star Trek: First Contact in the theater; I'm waiting for it to come out on tape.


A real test of intelligence or merely PR?

There is, however, controversy surrounding the Loebner Prize. Some opponents challenge the idea of the Turing Test as an adequate test of intelligence because it relies solely on the ability to fool people. Stuart M. Shieber, Gordon McKay Professor of Computer Science, Harvard University, argues "that the competition has no clear purpose, that its design prevents any useful outcome, and that such a competition is inappropriate given the current level of technology."8 Ned Block, a professor of philosophy at MIT, has argued that the Turing Test is a sorely inadequate test of intelligence because it relies solely on the ability to fool people.9 Marvin Minsky, Toshiba Professor of Media Arts and Sciences at MIT, expressed his skepticism toward the competition by proposing a counter prize of $100 to the person who could persuade Loebner to end his contest.10 Nevertheless, in all Loebner Prize competitions some judges have mistaken computers for humans, and conversely some have mistaken humans for computers.

Most of the criticism came from the artificial intelligence community. Indeed, most of the tricks that worked had nothing to do with AI methods, but were rather manipulations of subtle language techniques:


• the repetition of previous statements verbatim (subject to pronominal adjustments);
• answering by repeating a judge's sentence, with pronouns transposed, which is preceded by the introductory "Why do you need to tell me";
• asking, "Why do you ask that?" which, in effect, changes the level and/or topic of conversation.


Besides, predictably enough, the judges' education, age, and, most importantly, computer literacy and awareness played a pivotal role in the judges' ability to evaluate contestants.

From my point of view, the competition underscored the multifaceted characteristics of human personality and how the general public's view of human intelligence differs from that of scientists. An average citizen tends to pay more attention to the social aspects of communication, while scientists put more emphasis on a human's logical ability. The AI community tends to evaluate programs not by final results, but rather by the complexity of internal algorithms. When it turned out that these complex methods were not up to the challenge of passing the Turing Test, some members of the AI community attempted to reject the competition along with the whole idea of the Turing Test. In response, and along with the progress achieved by software developers, contest administrators are gradually eliminating restrictions in the Loebner competition. They are attempting to bring the contest closer to the AI community and to combine it with scientific conferences.


A REAL-LIFE TURING TEST 

However, it was neither the complexity of an algorithm nor the power of the computer that made Deep Blue's match victory so remarkable. It was Garry Kasparov's reaction that proved the computer's intelligence according to Alan Turing's classical definition of artificial intelligence. This recent chess match gave us an excellent example of a real-life Turing Test. At numerous press conferences during and after the match, Kasparov expressed doubts that he had played against the computer itself. He implied that there was some untoward behavior by the Deep Blue team, saying that "I have no idea what's happening behind the curtain." Kasparov also alluded to famous soccer player Diego Maradona, who allegedly scored a goal with his hand (as a postgame slow-motion film suggested), though it appeared (during the game) as if he had used his head. Kasparov stopped short of directly accusing the Deep Blue team of human intervention in the process of selecting moves, but went so far as to admit the appearance of human intelligence in the computer's actions. It did appear as if Kasparov confused the computer with a human.


"I have no idea what's happening behind the curtain." Garry Kasparov implies untoward behavior by the Deep Blue team.


March 1999

Kasparov's suspicions were shared by some of his fellow grand masters. But is it possible to draw a clear demarcation between a computer's and a human's chess moves? Anjelina Belakovskaia is the US Women's Chess Champion, an International Grand Master, and an active proponent of human-computer cooperation in chess. She says it was easier to distinguish the moves of previous versions of computer chess programs from those of human players because computers clearly paid much more attention to material rather than positional advantage. In contrast, the latest version of Deep Blue considered positional as well as material advantages, played much more aggressively, and for these reasons could be mistaken for a human.

Even with prior versions of chess programs, computers were difficult to distinguish from human chess players. In 1991, Frederic Friedel, a chess journalist from Germany and the author of a popular chess program, conducted an informal version of the Turing Test in chess with Garry Kasparov. Kasparov's task was to identify a computer, Deep Thought (an earlier version of Deep Blue), by reviewing a database containing game records of a chess tournament. In a fairly randomized experiment, Kasparov was able to identify a computer among eight players in 50 percent of the rounds. If in 1991 Garry Kasparov, an expert user and promoter of chess computers, barely passed this test, then in 1997 it seems that he failed to distinguish between human and artificial intellect. Did Deep Blue become the first computer to pass a Turing Test on artificial intelligence? It would seem so.

Alan Turing was probably right in considering chess the area of human intellect most amenable to computer simulation. He was also right in predicting that computers would demonstrate near-human intellect by the end of the century. However, the evolution of computers since Turing's time has tended to regard the two intellects as very separate, even adversarial.


Computer

Anjelina Belakovskaia thinks it is time to stop test- 
ing computers and playing “us against them,” and 
instead begin using their power in collaboration. She 
is planning to participate in the first match against 
another human chess player where both players are 
allowed to use the help of computers as partners. 

The possibilities arising from collaboration among humans and computers are even more intriguing than their differences were almost 50 years ago. I think Alan Turing would have agreed.


References 

1. R. McFadden, "Inscrutable Conqueror," The New York Times, May 12, 1997, p. 1.

2. S. Hamilton and L. Garber, "Deep Blue's Hardware-Software Synergy," Computer, Oct. 1997, pp. 29-35.

3. D. King, Kasparov vs. Deeper Blue: The Ultimate Man vs. Machine Challenge, Trafalgar Square, North Pomfret, Vt., 1997.

4. A. Condon, Computational Models of Games, MIT Press, Cambridge, Mass., 1989.

5. M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, New York, 1979.

6. A.M. Turing, "Computing Machinery and Intelligence," A.P. Anderson, ed., Minds and Machines, Prentice Hall, Englewood Cliffs, N.J., 1964.

7. Home page of the Loebner Prize 1999, http://www.loebner.net/Prizef/loebner-prize.html (Current 28 Jan. 1999).

8. Lessons from a Restricted Turing Test, 1999, http://www.eecs.harvard.edu/shieber/papers/loebner-rev-html/loebner-rev-html.html (Current 28 Jan. 1999).

9. N. Block, "The Computer Model of the Mind," D.N. Osherson and E.E. Smith, eds., An Introduction to Cognitive Science III: Thinking, MIT Press, Cambridge, Mass., 1990, pp. 147-289.

10. Minsky Thread, 1999, http://www.loebner.net/Prizef/minsky.html (Current 28 Jan. 1999).


Marina Krol is a research assistant professor at the Mount Sinai School of Medicine in New York. Her research interests include decision support systems and human-computer interaction. Krol has a PhD in computer science from the City University of New York (CUNY). She is a member of the IEEE Computer Society.


Contact the author at Marina Krol, PhD, Box 1010, The Mount Sinai School of Medicine, 1 Gustave L. Levy Place, New York, NY 10029-6574; marina_krol@smtplink.mssm.edu.


